feat(node): add ClickHouse database node (db_postgres clone)#1064
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (5)
📝 WalkthroughWalkthroughAdds a ClickHouse database node: implements ClickHouse-specific IGlobal for normalized connection params and native DSN construction (with TLS/port logic), registers the node in services.json, pins ClickHouse deps, provides package/init, README, IInstance, and unit tests for DSN and retry parsing. ChangesClickHouse Database Node
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
No description provided. |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@nodes/src/nodes/db_clickhouse/IGlobal.py`:
- Around line 67-85: The _build_connection_url function currently only
URL-encodes params['password'], leaving params['user'] and params['database']
unescaped which can break the DSN when they contain reserved characters; update
_build_connection_url to quote_plus both params['user'] and params['database']
(in addition to params['password']) and use the quoted values in both the TLS
branch (return in the if params.get('tls') block) and the non-TLS return path so
the constructed clickhouse+native://{user}:{password}@{host}/{database} string
is always safe; apply the same change pattern to the corresponding
_build_connection_url implementations in db_postgres/IGlobal.py and
db_mysql/IGlobal.py for parity.
In `@nodes/src/nodes/db_clickhouse/services.json`:
- Line 50: The current "description" in services.json describes the node as an
insert-only sink; update the description string for the node (the "description"
property in nodes/src/nodes/db_clickhouse/services.json) to reflect its primary
role as a query/agent tool that translates questions to SQL and executes queries
(question→SQL execution) and exposes the agent tool surface methods get_data,
get_schema and get_sql, while also noting it can insert or map fields for
ingestion—make the wording UI-friendly and accurate so users understand both
querying and optional ingestion capabilities.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4ac54fc4-c1ae-4979-a106-289d32bf2ae7
⛔ Files ignored due to path filters (1)
nodes/src/nodes/db_clickhouse/clickhouse.svgis excluded by!**/*.svg
📒 Files selected for processing (8)
nodes/src/nodes/db_clickhouse/IGlobal.pynodes/src/nodes/db_clickhouse/IInstance.pynodes/src/nodes/db_clickhouse/README.mdnodes/src/nodes/db_clickhouse/__init__.pynodes/src/nodes/db_clickhouse/requirements.txtnodes/src/nodes/db_clickhouse/services.jsonnodes/test/db_clickhouse/__init__.pynodes/test/db_clickhouse/test_clickhouse_dsn.py
Adds a dedicated ClickHouse node, structured as a thin dialect clone of the existing db_mysql / db_postgres nodes — connection params + DSN builder are the only ClickHouse-specific code; schema reflection, NL->SQL, EXPLAIN validation, SELECT-only safety, and insertion are inherited unchanged from ai.common.database. Dual role (classType ["database","tool"]): - pipeline node: questions -> SQL -> execute; answers/table -> insert - agent tool: clickhouse.get_data / get_schema / get_sql Driver: clickhouse-sqlalchemy native TCP (clickhouse-driver), port 9000. ClickHouse-only extra: a `tls` toggle (distinct from the shared password-field "secure" attribute) that switches the DSN to TLS and assumes the ClickHouse Cloud native port 9440 — verified against a local server and ClickHouse Cloud. Read-only by default; raw SQL gated behind allow_execute, matching MySQL/PostgreSQL. Includes DSN unit tests (nodes/test/db_clickhouse). Fixes rocketride-org#1051 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
9153a33 to
4186f3a
Compare
There was a problem hiding this comment.
Actionable comments posted: 5
♻️ Duplicate comments (2)
nodes/src/nodes/db_clickhouse/IGlobal.py (1)
67-85:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winEncode
useranddatabasein the DSN too.Only escaping
passwordstill leaves valid usernames/databases with reserved characters able to break the SQLAlchemy URL. This was already raised on an earlier revision and is still present.Suggested fix
def _build_connection_url(self, params: Dict[str, str]) -> str: # URL-encode the password so special characters (e.g. @, /, #) don't # break the SQLAlchemy connection string. + user = urllib.parse.quote_plus(params['user']) password = urllib.parse.quote_plus(params['password']) + database = urllib.parse.quote_plus(params['database']) @@ - return f'clickhouse+native://{params["user"]}:{password}@{host}/{params["database"]}?secure=true' + return f'clickhouse+native://{user}:{password}@{host}/{database}?secure=true' @@ - return f'clickhouse+native://{params["user"]}:{password}@{host}/{params["database"]}' + return f'clickhouse+native://{user}:{password}@{host}/{database}'🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@nodes/src/nodes/db_clickhouse/IGlobal.py` around lines 67 - 85, The DSN builder in _build_connection_url only URL-encodes params['password'], leaving params['user'] and params['database'] able to break the SQLAlchemy URL; URL-encode both the username and database (e.g., via urllib.parse.quote_plus) before interpolating them into the returned clickhouse+native:// string while preserving existing host/port and ?secure=true behavior (i.e., replace direct uses of params["user"] and params["database"] with their quoted equivalents alongside the already-quoted password).nodes/src/nodes/db_clickhouse/services.json (1)
50-50:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUpdate the description to match the node’s actual role.
This still reads like an insert-only sink, but the node’s primary surface is question-to-SQL querying plus tool functions. The UI copy will mislead users selecting it.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@nodes/src/nodes/db_clickhouse/services.json` at line 50, The current "description" string in services.json describes an insert-only ClickHouse sink; update the "description" value to accurately reflect this node's primary role as a question-to-SQL querying tool with tool functions (e.g., SQL generation, query execution, and optional insertion), mentioning that it translates natural-language questions into SQL, runs queries against ClickHouse, returns results, and can optionally map fields for inserts; locate and edit the "description" array entry in nodes/src/nodes/db_clickhouse/services.json to replace the insert-focused copy with the new concise summary.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@nodes/src/nodes/db_clickhouse/IGlobal.py`:
- Around line 73-78: The TLS default-port logic incorrectly treats IPv6 literals
as having a port because it checks for ':' in the host; update the check used
when params.get('tls') is true so it's bracket-aware: for hosts starting with
'[' (IPv6 literal) detect a port only if there's a ']' followed by ':' (e.g.,
'...[ ]:port'), otherwise for non-bracketed hosts keep the existing ':' check;
if no port is detected append ':9440' to host. Refer to the params.get('tls')
check and the host variable in IGlobal.py and implement the bracket-aware port
detection before adding ':9440'.
- Around line 87-91: The _max_validation_attempts function currently casts
config.get('max_attempts', 5) to int but does not enforce the documented 1–20
bounds; update _max_validation_attempts to parse the value (catching
ValueError/TypeError), then clamp the resulting int into the inclusive range
1..20 and return that clamped value (falling back to 5 on parse failure), so the
'max_attempts' config key cannot produce 0, negatives, or values >20.
- Around line 57-61: The config values for host, user, database, and table are
being .strip()'d without guarding against explicit nulls; update the code that
builds the connection dict (the block creating 'host', 'user', 'password',
'database', 'table' entries in IGlobal.py) to coerce None to the intended
default before calling .strip() — e.g., use a None-safe expression like
(config.get('host') or 'localhost').strip() and similarly for 'user',
'database', and 'table'; also ensure 'password' is coerced to '' when None (but
keep the comment about not stripping).
In `@nodes/src/nodes/db_clickhouse/services.json`:
- Around line 73-76: The manifest currently exposes a write path via the
"answers" ingestion lane in services.json (see the "lanes" object) which lets
the node mutate ClickHouse even when only clickhouse.allow_execute is used;
either remove the "answers" lane from the default manifest or add explicit
write/disposal guards (e.g., "allow_write" and "allow_drop" flags) and wire
those flags into the node's ingestion/mutation path (the code paths that handle
answers ingestion and auto-create/insert behaviour referenced in the same
service and the related entries at lines 158-179) so that write/drop operations
are blocked unless the new toggles are explicitly enabled. Ensure the chosen
approach is applied consistently to the other affected service entries mentioned
(158-179).
In `@nodes/test/db_clickhouse/test_clickhouse_dsn.py`:
- Around line 98-170: Tests are missing regression coverage for three failure
modes: explicit nulls for stripped fields, IPv6 host literals with TLS, and
out-of-range max_attempts; update the test suite to add cases exercising
g._connection_params when fields like user/password/database are None (ensure
they result in empty/stripped values rather than errors), add TLS parsing tests
for IPv6 hosts (e.g. host like "[::1]" and "[::1]:9000") to verify
_build_connection_url preserves brackets and port and appends ?secure=true when
tls is truthy, and add bounds tests for g._max_validation_attempts to assert
that values below the minimum and above the maximum (and None/non-int) fall back
to the allowed range (and are cast to int where appropriate) so implementation
regressions are caught.
---
Duplicate comments:
In `@nodes/src/nodes/db_clickhouse/IGlobal.py`:
- Around line 67-85: The DSN builder in _build_connection_url only URL-encodes
params['password'], leaving params['user'] and params['database'] able to break
the SQLAlchemy URL; URL-encode both the username and database (e.g., via
urllib.parse.quote_plus) before interpolating them into the returned
clickhouse+native:// string while preserving existing host/port and ?secure=true
behavior (i.e., replace direct uses of params["user"] and params["database"]
with their quoted equivalents alongside the already-quoted password).
In `@nodes/src/nodes/db_clickhouse/services.json`:
- Line 50: The current "description" string in services.json describes an
insert-only ClickHouse sink; update the "description" value to accurately
reflect this node's primary role as a question-to-SQL querying tool with tool
functions (e.g., SQL generation, query execution, and optional insertion),
mentioning that it translates natural-language questions into SQL, runs queries
against ClickHouse, returns results, and can optionally map fields for inserts;
locate and edit the "description" array entry in
nodes/src/nodes/db_clickhouse/services.json to replace the insert-focused copy
with the new concise summary.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: c5e53d54-453f-4140-acf4-1fcdae2bd12f
⛔ Files ignored due to path filters (1)
nodes/src/nodes/db_clickhouse/clickhouse.svgis excluded by!**/*.svg
📒 Files selected for processing (8)
nodes/src/nodes/db_clickhouse/IGlobal.pynodes/src/nodes/db_clickhouse/IInstance.pynodes/src/nodes/db_clickhouse/README.mdnodes/src/nodes/db_clickhouse/__init__.pynodes/src/nodes/db_clickhouse/requirements.txtnodes/src/nodes/db_clickhouse/services.jsonnodes/test/db_clickhouse/__init__.pynodes/test/db_clickhouse/test_clickhouse_dsn.py
asclearuc
left a comment
There was a problem hiding this comment.
@mithileshgau Thanks for this — the architecture is clean and the DSN tests are nice and focused.
Value
A first-class ClickHouse node is genuinely useful, and the approach is right:
the heavy logic already lives in ai.common.database, so this PR only adds the
~120 lines of real ClickHouse code (DSN + TLS). Good.
ClickHouse specifics — one blocker
The README and PR description advertise the answers insert lane and
"auto-creates the target table on first insert." That inherited path does not
work on ClickHouse, because the shared _createTableFromData builds the table
with an auto-increment integer primary key and no table engine — neither exists
in ClickHouse.
This is complementary to CodeRabbit's note about allow_execute not gating
the write lane: CodeRabbit asks whether writes should be open; this is that
they do not work. The cleanest fix probably satisfies both at once.
Verdict
Request changes. The node's read/query/tool path looks solid, but the
insert/auto-create path is advertised and broken on ClickHouse. Either disable
it for ClickHouse and update the docs, or implement ClickHouse-correct table
creation. Plus the CodeRabbit items above.
…ported insert lane Addresses review feedback on the ClickHouse node PR. DSN safety (db_clickhouse + db_mysql + db_postgres, for parity): - URL-encode user and database (not just password) in _build_connection_url so reserved characters can't break the SQLAlchemy URL. db_clickhouse hardening: - _connection_params: coerce explicit null host/user/database/table to their defaults before .strip() (was AttributeError on a stored null). - _build_connection_url: bracket-aware port detection for IPv6 literals so a TLS host like [::1] correctly gets :9440 and isn't mis-parsed. - _max_validation_attempts: clamp to the documented 1..20 range. Remove the broken / ungated insert path: - Drop the `answers` ingestion lane from the manifest. The inherited auto-create-table helper uses an auto-increment integer PK and no table engine — neither exists in ClickHouse — so the advertised insert/auto-create path could not work, and it also bypassed allow_execute. Removing the lane resolves both the "broken on ClickHouse" and "ungated write" review notes. - Rewrite the node description (query/agent-tool focus, not insert-only) and update the README (remove insert lane + auto-create claims, add an Ingestion section explaining the deliberate exclusion). Tests: add coverage for null coercion, IPv6+TLS, user/database encoding, and max_attempts clamping (32 DSN unit tests pass). Refs rocketride-org#1051
Thanks @alexandru — great catch on the insert path, that was the right call. Pushed The blocker — insert/auto-create broken on ClickHouseYou're right that the inherited
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@nodes/src/nodes/db_clickhouse/IGlobal.py`:
- Around line 108-109: The _db_description method can return None when config
contains "db_description": null which violates its -> str contract; update
_db_description (the method named _db_description) to coerce null to an empty
string the same way other nullable config fields are handled: read
config['db_description'], if it is None or not a string return '' (or str(value)
if you prefer) and always return a str so callers relying on a string never get
None.
In `@nodes/src/nodes/db_clickhouse/README.md`:
- Around line 69-71: The README's Ingestion section is ambiguous about the
"answers" lane; update the text to explicitly state that the node does not
expose the ingestion/input "answers" lane (used for pipeline inserts) but still
supports the output "questions -> answers" query path; specifically edit the
sentence referencing "this node intentionally does not expose a pipeline
ingestion (`answers`) lane" to add a clarifying clause like "this only removes
the ingestion/input `answers` lane (used for inserts), not the `questions ->
answers` output lane used for querying" so readers understand there is no
contradiction.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4d0ed11b-7612-4ce7-99d3-e918e941c6cc
📒 Files selected for processing (6)
nodes/src/nodes/db_clickhouse/IGlobal.pynodes/src/nodes/db_clickhouse/README.mdnodes/src/nodes/db_clickhouse/services.jsonnodes/src/nodes/db_mysql/IGlobal.pynodes/src/nodes/db_postgres/IGlobal.pynodes/test/db_clickhouse/test_clickhouse_dsn.py
…estion note - _db_description: a stored `"db_description": null` returned None, violating the -> str contract. Coerce null / non-string to '' (add regression tests). - README Ingestion: clarify that removing the `answers` lane drops only the ingestion/input lane (pipeline inserts), not the `questions -> answers` output lane used for querying. Refs rocketride-org#1051
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
nodes/src/nodes/db_clickhouse/IGlobal.py (1)
60-64:⚠️ Potential issue | 🟠 Major | ⚡ Quick winWhitespace-only connection fields bypass the intended defaults.
Lines 60-64 only fall back before calling
.strip(), so inputs like' 'become''rather than'localhost'/'default'/'table'. That produces invalid connection URLs from otherwise recoverable config and contradicts the normalization comment above this block. Strip first, then apply the default in one helper so whitespace-only and non-string imported values are handled consistently.Proposed fix
class IGlobal(DatabaseGlobalBase): @@ + `@staticmethod` + def _string_param(config: Dict[str, Any], key: str, default: str) -> str: + value = config.get(key) + if value is None: + return default + return str(value).strip() or default + def _connection_params(self, config: Dict[str, Any]) -> Dict[str, str]: @@ return { - 'host': (config.get('host') or 'localhost').strip(), - 'user': (config.get('user') or 'default').strip(), + 'host': self._string_param(config, 'host', 'localhost'), + 'user': self._string_param(config, 'user', 'default'), 'password': config.get('password') or '', # Do not strip — whitespace is valid in passwords - 'database': (config.get('database') or 'default').strip(), - 'table': (config.get('table') or 'table').strip(), + 'database': self._string_param(config, 'database', 'default'), + 'table': self._string_param(config, 'table', 'table'), 'tls': 'true' if tls else '', }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@nodes/src/nodes/db_clickhouse/IGlobal.py` around lines 60 - 64, Create a small normalization helper in IGlobal.py (e.g., normalize_field(value, default)) that coerces non-None values to str, calls .strip(), and returns the default when the stripped result is empty; then replace the current inline expressions for 'host', 'user', 'database', and 'table' to use normalize_field(config.get('host'), 'localhost') / normalize_field(config.get('user'), 'default') / normalize_field(config.get('database'), 'default') / normalize_field(config.get('table'), 'table'); leave 'password' as (config.get('password') or '') without stripping (per the existing comment) so whitespace-only passwords are preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@nodes/src/nodes/db_clickhouse/IGlobal.py`:
- Around line 60-64: Create a small normalization helper in IGlobal.py (e.g.,
normalize_field(value, default)) that coerces non-None values to str, calls
.strip(), and returns the default when the stripped result is empty; then
replace the current inline expressions for 'host', 'user', 'database', and
'table' to use normalize_field(config.get('host'), 'localhost') /
normalize_field(config.get('user'), 'default') /
normalize_field(config.get('database'), 'default') /
normalize_field(config.get('table'), 'table'); leave 'password' as
(config.get('password') or '') without stripping (per the existing comment) so
whitespace-only passwords are preserved.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8f8089b0-3107-4d36-96bd-4ac243fdc7d4
📒 Files selected for processing (3)
nodes/src/nodes/db_clickhouse/IGlobal.pynodes/src/nodes/db_clickhouse/README.mdnodes/test/db_clickhouse/test_clickhouse_dsn.py
- Extract a _normalize_field(value, default) helper used for host/user/ database/table: coerces non-None values via str() before .strip() (so a non-string config value can't raise AttributeError) and falls back to the default when the result is empty/whitespace-only. Password stays uncoerced (whitespace preserved). Adds regression coverage. - Add docstrings to all overridden methods across the three DB nodes (db_clickhouse / db_mysql / db_postgres) to satisfy the docstring-coverage threshold (touched files now 100%). Refs rocketride-org#1051
Summary
db_clickhouse), implemented as a thin dialect clone of the existingdb_mysql/db_postgresnodes over the sharedai.common.databasebase — only the connection params + DSN builder are ClickHouse-specific; schema reflection, NL→SQL, EXPLAIN validation, SELECT-only safety, and insertion are inherited unchanged.classType: ["database","tool"]): pipeline node (questions→SQL→execute; answers/table→insert) and agent tool (clickhouse.get_data/get_schema/get_sql).tlstoggle for TLS connections (ClickHouse Cloud, native port 9440) — kept distinct from the shared password-fieldsecureattribute so the core stays aligned with MySQL/PostgreSQL.Problem
RocketRide had no first-class ClickHouse support. Users wanting analytical Postgres-style querying or a ClickHouse-backed agent tool had no node for it.
Solution
db_clickhouseclones the provendb_postgres/db_mysqlpattern:IGlobalsupplies the ClickHouse connection params and aclickhouse-sqlalchemynative-TCP DSN (clickhouse+native://, port 9000),IInstancesupplies the display name/dialect.clickhouse-sqlalchemyreflects ClickHouse's empty FK list + best-effort PK, so the dialect-agnostic base works as-is. Read-only by default; raw SQL gated behindallow_execute, matching the other DB nodes.Alternatives considered
mcp-clickhouse, or the hostedmcp.clickhouse.cloudremote MCP) — a viable complementary node, the hosted MCP is OAuth-only and would need OAuth support added totool_mcp_client.allow_write/allow_drop+run_querytool surface; it's a base-class change affecting MySQL/PostgreSQL too.Notes / caveats
db_mysql/db_postgresand left for parity(a base-wide fix is the right place):
.strip()on an explicitnullconfig value, andIPv6-host port detection in the new TLS branch. Neither is reachable with realistic
ClickHouse hosts (Cloud hostnames /
localhost).to keep this PR focused on the node.
Type
Feature — new pipeline node.
Testing
nodes/test/db_clickhouse/(22 DSN unit tests)ClickHouse Cloud, both as a pipeline node and as a CrewAI agent tool
./builder testpasses — ran./builder nodes:test-contracts(210 passed) + the newunit tests +
ruff check/format; full C++./builder testnot run locallyChecklist
Linked Issue
Fixes #1051
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests
Chores